
    Online and offline heuristics for inferring hierarchies of repetitions in sequences

    Hierarchical dictionary-based compression schemes form a grammar for a text by replacing each repeated string with a production rule. While such schemes usually operate online, making a replacement as soon as repetition is detected, offline operation permits greater freedom in choosing the order of replacement. In this paper, we compare the online method with three offline heuristics for selecting the next substring to replace: longest string first, most common string first, and the string that minimizes the size of the grammar locally. Surprisingly, two of the offline techniques, like the online method, run in time linear in the size of the input. We evaluate each technique on artificial and natural sequences. In general, the locally-most-compressive heuristic performs best, followed by most-common-first, the online technique, and, lagging by some distance, the longest-first technique.
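The "most common string first" idea can be illustrated with a small sketch. It is a simplification, not the paper's algorithm: it only considers adjacent pairs of symbols rather than arbitrary substrings, it rescans the sequence on every round (so it is not linear time), and the names `count_pairs`, `replace_pair`, `build_grammar`, `expand`, and the rule labels `R1`, `R2`, ... are hypothetical.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair, scanning left to right."""
    counts = Counter()
    next_free = {}  # pair -> first index at which the pair may start again
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if next_free.get(pair, 0) <= i:
            counts[pair] += 1
            next_free[pair] = i + 2
    return counts


def replace_pair(seq, pair, symbol):
    """Replace every non-overlapping occurrence of `pair` with `symbol`."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def build_grammar(text):
    """Offline heuristic (assumed simplification): replace the most common pair first."""
    seq = list(text)
    rules = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, n = max(counts.items(), key=lambda kv: kv[1])
        if n < 2:  # nothing repeats any more
            break
        name = f"R{len(rules) + 1}"  # hypothetical rule label
        rules[name] = pair
        seq = replace_pair(seq, pair, name)
    return seq, rules


def expand(symbols, rules):
    """Expand a sequence of symbols back into the original text."""
    return "".join(
        expand(rules[s], rules) if s in rules else s for s in symbols
    )


start, rules = build_grammar("abcabcabc")
print(start, rules)
```

Expanding the start sequence with `expand(start, rules)` should reproduce the input, which is a quick check that the grammar is lossless.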

    Detecting sequential structure

    Programming by demonstration requires detection and analysis of sequential patterns in a user’s input, and the synthesis of an appropriate structural model that can be used for prediction. This paper describes SEQUITUR, a scheme for inducing a structural description of a sequence from a single example. SEQUITUR integrates several different inference techniques: identification of lexical subsequences or vocabulary elements, hierarchical structuring of such subsequences, identification of elements that have equivalent usage patterns, inference of programming constructs such as looping and branching, generalisation by unifying grammar rules, and the detection of procedural substructure. Although SEQUITUR operates with abstract sequences, a number of concrete illustrations are provided.

    Steady-state, effective-temperature dynamics in a glassy material

    We present an STZ-based analysis of numerical simulations by Haxton and Liu (HL). The extensive HL data sharply test the basic assumptions of the STZ theory, especially the central role played by the effective disorder temperature as a dynamical state variable. We find that the theory survives these tests, and that the HL data provide important and interesting constraints on some of its specific ingredients. Our most surprising conclusion is that, when driven at various constant shear rates in the low-temperature glassy state, the HL system exhibits a classic glass transition, including super-Arrhenius behavior, as a function of the effective temperature. Comment: 9 pages, 6 figures

    Generalized Modeling Approaches to Risk Adjustment of Skewed Outcomes Data

    There are two broad classes of models used to address the econometric problems caused by skewness in data commonly encountered in health care applications: (1) transformation to deal with skewness (e.g., OLS on ln(y)); and (2) alternative weighting approaches based on exponential conditional models (ECM) and generalized linear models (GLM). In this paper, we encompass these two classes of models using the three-parameter generalized gamma (GGM) distribution, which includes several of the standard alternatives as special cases: OLS with a normal error, OLS for the log normal, the standard gamma and exponential with a log link, and the Weibull. Using simulation methods, we find the tests of identifying distributions to be robust. The GGM also provides a potentially more robust alternative estimator to the standard alternatives. An example using inpatient expenditures is also analyzed.
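The nesting of special cases can be checked numerically. The sketch below uses Stacy's parameterization of the generalized gamma density, f(x; a, d, p) = (p / a^d) x^(d-1) exp(-(x/a)^p) / Γ(d/p), which is an assumption here and may differ from the parameterization used in the paper. The function names are hypothetical, and the log-normal case (a limiting case) is not covered.

```python
import math


def gen_gamma_pdf(x, a, d, p):
    """Generalized gamma density in Stacy's form (assumed parameterization)."""
    return (p / a**d) * x ** (d - 1) * math.exp(-((x / a) ** p)) / math.gamma(d / p)


def weibull_pdf(x, scale, shape):
    """Weibull density, the d == p special case."""
    return (shape / scale) * (x / scale) ** (shape - 1) * math.exp(-((x / scale) ** shape))


def gamma_pdf(x, scale, shape):
    """Standard gamma density, the p == 1 special case."""
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale**shape)


def exponential_pdf(x, scale):
    """Exponential density, the d == p == 1 special case."""
    return math.exp(-x / scale) / scale


x = 2.5
print(gen_gamma_pdf(x, a=3.0, d=1.7, p=1.7), weibull_pdf(x, 3.0, 1.7))
print(gen_gamma_pdf(x, a=3.0, d=2.2, p=1.0), gamma_pdf(x, 3.0, 2.2))
print(gen_gamma_pdf(x, a=3.0, d=1.0, p=1.0), exponential_pdf(x, 3.0))
```

Each printed pair should agree up to floating-point rounding, which is the sense in which a single three-parameter family covers the separate models.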

    Extracting text from PostScript

    We show how to extract plain text from PostScript files. A textual scan is inadequate because PostScript interpreters can generate characters on the page that do not appear in the source file. Furthermore, word and line breaks are implicit in the graphical rendition, and must be inferred from the positioning of word fragments. We present a robust technique for extracting text and recognizing words and paragraphs. The method uses a standard PostScript interpreter but redefines several PostScript operators, and simple heuristics are employed to locate word and line breaks. The scheme has been used to create a full-text index, and plain-text versions, of 40,000 technical reports (34 Gbyte of PostScript). Other text-extraction systems are reviewed: none offers the same combination of robustness and simplicity.
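A minimal sketch of how word and line breaks might be inferred from positioned fragments follows. It is illustrative only: the abstract does not give the actual heuristics, so the `Fragment` record, the `join_fragments` function, and the thresholds `space_gap` and `line_gap` are all assumptions, and the fragments are assumed to arrive in rendering order.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A piece of rendered text with its position (hypothetical record)."""
    text: str
    x: float      # left edge on the page
    y: float      # baseline height on the page
    width: float  # rendered width


def join_fragments(fragments, space_gap=1.5, line_gap=5.0):
    """Rebuild lines of text from fragments given in rendering order.

    A vertical jump larger than `line_gap` starts a new line; a horizontal
    gap larger than `space_gap` inserts a space. Both thresholds are assumed.
    """
    lines = []
    current = ""
    prev = None
    for frag in fragments:
        if prev is None:
            current = frag.text
        elif abs(frag.y - prev.y) > line_gap:
            lines.append(current)
            current = frag.text
        elif frag.x - (prev.x + prev.width) > space_gap:
            current += " " + frag.text
        else:
            current += frag.text  # abutting fragments belong to one word
        prev = frag
    if prev is not None:
        lines.append(current)
    return lines


page = [
    Fragment("Post", 0.0, 700.0, 20.0),
    Fragment("Script", 20.0, 700.0, 30.0),
    Fragment("text", 55.0, 700.0, 18.0),
    Fragment("here", 0.0, 688.0, 20.0),
]
print(join_fragments(page))
```

In practice the thresholds would likely need to scale with the font size of each fragment; fixed values are used here only to keep the sketch short.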

    Scaling and Universality in the Counterion-Condensation Transition at Charged Cylinders

    We address the critical and universal aspects of the counterion-condensation transition at a single charged cylinder in both two and three spatial dimensions using numerical and analytical methods. By introducing a novel Monte-Carlo sampling method in logarithmic radial scale, we are able to numerically simulate the critical limit of infinite system size (corresponding to the infinite-dilution limit) within tractable equilibration times. The critical exponents are determined for the inverse moments of the counterionic density profile (which play the role of the order parameters and represent the inverse localization length of counterions) both within mean-field theory and within Monte-Carlo simulations. In three dimensions (3D), correlation effects (neglected within mean-field theory) lead to an excessive accumulation of counterions near the charged cylinder below the critical temperature (condensation phase), while surprisingly, the critical region exhibits universal critical exponents in accord with the mean-field theory. In two dimensions (2D), we demonstrate, using both numerical and analytical approaches, that the mean-field theory becomes exact at all temperatures (Manning parameters) when the number of counterions tends to infinity. For finite particle number, however, the 2D problem displays a series of peculiar singular points (with diverging heat capacity), which reflect successive de-localization events of individual counterions from the central cylinder. In both 2D and 3D, the heat capacity shows a universal jump at the critical point, and the energy develops a pronounced peak. The asymptotic behavior of the energy peak location is used to locate the critical temperature, which is also found to be universal and in accordance with the mean-field prediction. Comment: 31 pages, 16 figures